Machine Learning

Movie Rating Prediction
(using spark-ML)

We built a machine learning model that predicts a movie's average rating from five features:
Director name
Writer name
Run time of the movie
Genre of the movie
Year of release

Why a Linear Regression Model?

It is a well-known machine learning model with a simple, interpretable representation.
It models a linear relationship between a dependent variable (the label) and one or more independent variables (the features).
A scatter plot of our features against the label shows a close relationship between the two.
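
To make the idea of a linear relationship concrete, here is a small sketch of an ordinary least-squares fit with a single feature, using only the Python standard library. The run times and ratings are invented for illustration and are not taken from the project's data.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: run time in minutes vs. average rating.
runtimes = [90, 100, 110, 120, 130]
ratings = [6.0, 6.5, 7.0, 7.5, 8.0]

slope, intercept = fit_line(runtimes, ratings)
predicted = slope * 115 + intercept  # predicted rating for a 115-minute movie
print(slope, intercept, predicted)
```

With several features, the model has the same form: one coefficient per feature plus an intercept.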

Architecture


Pipeline


Model Summary

MAE (Mean Absolute Error), the average absolute difference between predicted and observed ratings: 0.135875
MSE (Mean Squared Error), the average squared difference between predicted and observed ratings: 0.051903
RMSE (Root Mean Squared Error), the square root of the MSE, expressed in the same units as the rating (closer to zero is better): 0.227823
r2 (R-squared), a statistical measure of how close the data are to the fitted regression line (closer to 1 is better): 0.955105
DOF (Degrees of Freedom), the number of independent pieces of information that go into the estimate of a parameter: 2261
Intercept: 6.53804863727
numIterations: 101
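
The error metrics in this summary follow standard definitions, sketched below in plain Python. The labels and predictions are made-up values, used only to show how the quantities relate to one another; they are not the project's data.

```python
import math


def regression_metrics(labels, predictions):
    """Compute MAE, MSE, RMSE and R-squared for paired values."""
    n = len(labels)
    errors = [p - y for y, p in zip(labels, predictions)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # RMSE is always the square root of MSE
    mean_label = sum(labels) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((y - mean_label) ** 2 for y in labels)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical observed ratings and model predictions.
labels = [6.0, 7.0, 8.0, 5.0]
predictions = [6.5, 7.0, 7.5, 5.0]

metrics = regression_metrics(labels, predictions)
print(metrics)
```

Two relationships hold for any data set: RMSE squared equals MSE, and MAE is never larger than RMSE.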

Result



Label vs. prediction scatter plot
Comparison
Test residuals
Train residuals